1.0 Introduction


One problem of concern in climatology is that of extending climatological
information spatially. Somerville and Bean (1981) have given some
techniques for developing probability models for visibility in West Germany
at locations where no historical records exist. The current study
consolidates and generalizes the basic theory for that problem.

The problem may be stated in the following way. We wish to estimate the
probability distribution of a climatic variable, such as visibility, for any
location in a given region where historical records do not necessarily exist.
However, we do assume that there are independent variables which may be measured
at the location of interest and which have some correlation with the parameters
of the probability distribution to be estimated. These variables may be, for
example, the elevation, the average elevation of the surrounding area, or other
geographical measures. Also, we assume that we have a sample of locations for
which we have historical records. Further, we assume that the distribution at
each location in the region is of the same form but that the parameters change
from location to location. The region may then be considered as a collection or
family of distributions. This family of distributions is indexed by a
p-dimensional parameter $\theta$. The parameter $\theta$ depends on an
independent variable $Z = (Z_1, Z_2, \ldots, Z_k)$. The k components of Z may be
measures of such attributes as elevation, average elevation of the surrounding
area, or others. $Z_1$ may be taken to be identically 1 to allow for a constant
term. $\theta$ is assumed to depend on Z in the form

$$\theta = Z\beta$$

where $\beta = (\beta_{il})$ is a $k \times p$ matrix of coefficients.


$\theta$ must be restricted to a set, say $\Omega \subset R^p$, on which the
distribution is defined. Thus, for Z contained in a set $T \subset R^k$ we must
restrict $\beta$ to some set, say B, such that $\theta \in \Omega$. The family
of distributions is characterized by

$$\mathcal{F} = \{F_\theta : \theta = Z\beta,\ Z \in T \text{ and } \beta \in B\}$$

where $F_\theta$ is the cumulative probability distribution of the climatic (or
other) variable of interest with parameter $\theta$.

Our objective is to estimate the entire family of distributions, $\mathcal{F}$,
by utilizing the dependence of $\theta$ on Z. To accomplish this, a random sample
of M distributions at M locations is taken from the family. Next, a sample of
size N is taken from each of these M distributions. That is, N observations of
the variable of interest, say X, are taken at each of the sampled locations.
Also, the independent variable Z is recorded at each of the sampled locations.
Thus at the jth location the sample will consist of

$$Z_{j1}, Z_{j2}, \ldots, Z_{jk} \quad \text{and} \quad X_{1j}, X_{2j}, \ldots, X_{Nj}$$
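The sampling scheme just described can be sketched in Python. This is only an illustration under stated assumptions: the family is taken to be normal with $\theta = (\mu, \sigma^2)$, the coefficient matrix and covariate rows are made-up values, and the function names are hypothetical rather than taken from the report.

```python
import random

def theta_from_covariates(z, beta):
    """Return theta = z * beta for a length-k row z and a k-by-p matrix beta."""
    k, p = len(beta), len(beta[0])
    assert len(z) == k
    return [sum(z[i] * beta[i][l] for i in range(k)) for l in range(p)]

def draw_family_sample(z_rows, beta, n_obs, rng):
    """Draw n_obs observations at each location from N(mu_j, sigma_j^2).

    Assumption for illustration only: theta_j = (mu_j, sigma_j^2).
    """
    samples = []
    for z in z_rows:
        mu, var = theta_from_covariates(z, beta)
        if var <= 0:
            # theta must stay inside Omega; here that means a positive variance
            raise ValueError("theta falls outside the admissible set")
        samples.append([rng.gauss(mu, var ** 0.5) for _ in range(n_obs)])
    return samples

# Hypothetical values: k = 2 covariates (constant term first), p = 2 parameters
beta = [[10.0, 4.0], [2.0, 0.5]]
z_rows = [[1.0, 0.0], [1.0, 1.5], [1.0, 3.0]]   # M = 3 locations
samples = draw_family_sample(z_rows, beta, n_obs=5, rng=random.Random(0))
```

Each inner list of `samples` plays the role of $X_{1j}, \ldots, X_{Nj}$ for one location, paired with the matching row of `z_rows`.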


$$\sigma_j^2 = \beta_{12} Z_{j1} + \beta_{22} Z_{j2} + \cdots + \beta_{k2} Z_{jk}$$


(Note: the variables $Z_{j1}, Z_{j2}, \ldots, Z_{jk}$ are the same in the above
expressions for $\mu_j$ and for $\sigma_j^2$ for notational simplicity. Other
variables could be used with only notational changes in the estimators.)

For $\mu_j$, we find that if we take partials of (2.1.1) with respect
to $\beta_{11}, \beta_{21}, \ldots, \beta_{k1}$ we obtain the following system of equations.
(2.1.2)


where

$$\bar{X}_{.j} = \sum_{i=1}^{N} X_{ij} / N .$$


The system of normal equations given in (2.1.2) may be written as

$$Z'Z\,\beta_{.1} = Z'\bar{X} . \qquad (2.1.3)$$

Solving (2.1.3) we obtain

$$\hat{\beta}_{.1} = (Z'Z)^{-1} Z'\bar{X} ,$$

and thus ,


based on the previous assumption


We can obtain the normal equations to estimate the parameters in the model for
$\sigma_j^2$ in a similar way. If we let

$$S_j^2 = \sum_{i=1}^{N} (X_{ij} - \bar{X}_{.j})^2 / N$$

and $\hat{\beta}_{.2} = (Z'Z)^{-1} Z'S^2$

and thus


The above estimates are equivalent to estimating $(\mu_j, \sigma_j^2)$ with the
usual MLE's, $(\bar{X}_{.j}, S_j^2)$. Then to obtain the overall model,
$\beta_{.1}$ and $\beta_{.2}$ are estimated by regressing $\bar{X}_{.j}$ and
$S_j^2$ on $Z_{j1}, Z_{j2}, \ldots, Z_{jk}$ for $j = 1, 2, \ldots, M$.
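The two-stage procedure described above (per-location means and variances, then two ordinary regressions on Z) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names are hypothetical, the variance uses the divisor N as in the usual MLE, and a small Gauss-Jordan solver stands in for whatever regression routine was actually used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(z_rows, y):
    """Solve the normal equations (Z'Z) b = Z'y for b."""
    k = len(z_rows[0])
    ztz = [[sum(z[r] * z[c] for z in z_rows) for c in range(k)] for r in range(k)]
    zty = [sum(z[r] * yj for z, yj in zip(z_rows, y)) for r in range(k)]
    return solve_linear(ztz, zty)

def two_stage_estimates(z_rows, samples):
    """Regress the location means and MLE variances on the covariates."""
    means = [sum(x) / len(x) for x in samples]
    variances = [sum((v - m) ** 2 for v in x) / len(x)
                 for x, m in zip(samples, means)]
    return least_squares(z_rows, means), least_squares(z_rows, variances)

# Hypothetical data: means 1, 3, 5 and variances 1, 4, 7 at Z_2 = 0, 1, 2
z_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
samples = [[0.0, 2.0], [1.0, 5.0], [5.0 - 7 ** 0.5, 5.0 + 7 ** 0.5]]
b_mean, b_var = two_stage_estimates(z_rows, samples)
```

In this toy data set the means and variances are exactly linear in the covariate, so the fitted coefficients recover the generating lines (intercept 1 and slope 2 for the mean, intercept 1 and slope 3 for the variance) up to rounding.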


2.2  Least  Squares  Estimators 

Another type of estimator, which we will refer to as the least squares
estimator (LSE), is an alternative to the MLE. The LSE is useful for its
robustness qualities. Somerville and Bean (1982) give a simulation study to
illustrate the robustness of the LSE. Parr and Schucany (1980) give a
discussion of minimum distance (MD) estimation, which covers a number of
estimators with robustness qualities. One of the most useful MD estimators is
obtained by minimizing the Cramer-von Mises statistic. This particular MD
estimator is equivalent to the LSE. We define a form of the Cramer-von Mises
distance, or discrepancy, between the empirical distributions and the model
distributions to be

$$d(\beta) = \sum_{j=1}^{M} \sum_{i=1}^{N} \left[ F(x_{(i)j};\, Z_j\beta) - \frac{2i-1}{2N} \right]^2 / NM .$$


The notation $x_{(i)j}$ refers to the $i$th largest observation at the $j$th
distribution (location).


The LSE for $\beta$ is that value $\hat{\beta} \in B$ such that

$$d(\hat{\beta}) = \inf_{\beta \in B} d(\beta) . \qquad (2.2.1)$$

This  estimator  is  found  using  non-linear  regression  methods.  Somerville  and 
Bean  (1981)  illustrate  the  methods  for  solving  the  non-linear  regression 
problem  in  this  context. 
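The distance that the LSE minimizes can be written as a short Python function. This is a sketch under stated assumptions: the observations at each location are sorted in increasing order (the ordering that the plotting position $(2i-1)/(2N)$ requires), a two-parameter Weibull CDF is used purely as an example of $F$, and all names and numeric values are hypothetical. No optimizer is included; the function could be handed to any non-linear minimization routine.

```python
import math

def weibull_cdf(x, theta):
    """Example model CDF, 1 - exp(-a * x**b), with theta = (a, b)."""
    a, b = theta
    return 1.0 - math.exp(-a * x ** b)

def weibull_quantile(u, theta):
    """Inverse of weibull_cdf, used only to build test data."""
    a, b = theta
    return (-math.log(1.0 - u) / a) ** (1.0 / b)

def cvm_distance(samples, z_rows, beta, cdf):
    """Mean squared gap between cdf(x_(i)j; Z_j beta) and (2i - 1) / (2N)."""
    total, count = 0.0, 0
    p = len(beta[0])
    for x, z in zip(samples, z_rows):
        # theta_j = Z_j beta for this location
        theta = [sum(zi * row[l] for zi, row in zip(z, beta)) for l in range(p)]
        n = len(x)
        for i, xi in enumerate(sorted(x), start=1):
            total += (cdf(xi, theta) - (2 * i - 1) / (2 * n)) ** 2
            count += 1
    return total / count

# Hypothetical coefficients and covariates (constant term first)
beta_true = [[0.04, 1.3], [0.005, -0.01]]
z_rows = [[1.0, 0.5], [1.0, 2.0]]
n = 10
samples = []
for z in z_rows:
    theta = [sum(zi * row[l] for zi, row in zip(z, beta_true)) for l in range(2)]
    # Idealized data placed exactly at the model quantiles
    samples.append([weibull_quantile((2 * i - 1) / (2 * n), theta)
                    for i in range(1, n + 1)])

d_true = cvm_distance(samples, z_rows, beta_true, weibull_cdf)
d_other = cvm_distance(samples, z_rows, [[0.08, 1.0], [0.0, 0.0]], weibull_cdf)
```

Because the toy data sit exactly on the model quantiles, `d_true` is zero up to rounding, while a different coefficient matrix gives a strictly larger distance.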


3.0  An  Example  Using  the  LSE 

The Weibull distribution has been used extensively by Somerville and Bean
(1979) for various climatic variables including visibility. The probability
that visibility is less than x miles, using the Weibull distribution, is given by

$$F(x) = 1 - e^{-\alpha x^{\beta}}, \quad \alpha, \beta > 0 .$$

The parameters $(\alpha, \beta)$ vary from location to location. Note that the
use of $\beta$ in this context differs from the previous sections.

A number of different variables including some climatic and geographical
variables were investigated, and the LSE for $(\alpha, \beta)$ using these
variables was developed by Somerville and Bean (1981). A future report which
uses more data than the above study will show that the LSE for
$(\alpha, \beta)$ based on the cube of elevation and average elevation of the
surrounding area provides a practical model which gives reasonable fits to the
data.

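Let

$$Z_1 = (\text{elevation in feet})^3 \cdot 10^{-9} ,$$

$$Z_2 = (\text{average elevation in feet})^3 \cdot 10^{-9} ,$$

where the average is taken over 20 locations equally spaced at a distance of
20 km from the location of interest, and let

$$\alpha = b_{01} + b_{11} Z_1 + b_{21} Z_2$$

$$\beta = b_{02} + b_{12} Z_1 + b_{22} Z_2$$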


Sixty locations throughout West Germany were used to obtain the LSE,
$\hat{b}$, for $b = (b_{01}, b_{11}, b_{21}, b_{02}, b_{12}, b_{22})$ for
various times of day and each month of the year.

For example, the LSE for the above parameters was obtained from the
appropriate form of (2.2.1) for April for hours 10-12. The resulting equations
are given by

$$\alpha = .0365 + .0049\, Z_1 - .0009\, Z_2$$

$$\beta = 1.32 - .0092\, Z_1 - .0160\, Z_2$$


The RMS defined by

$$\mathrm{RMS} = [d(\hat{b})]^{1/2}$$

is .058 based on the sixty locations for April hours 1000 - 1200.

Now suppose we wish to have the probability distribution at Konstanz,
Germany at the above mentioned month and time. The elevation of 1368 feet
yields $Z_1 = 2.56$, and the average elevation of 1725 yields $Z_2 = 4.71$. This
gives

$$F(x) = 1 - e^{-.045\, x^{1.22}}, \quad x \ge 0 .$$
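The arithmetic behind the Konstanz example can be checked with a few lines of Python. This is a sketch, not the authors' code: it uses the fitted April 1000-1200 coefficients quoted above, computes $Z_1$ from the stated elevation, and takes $Z_2 = 4.71$ as quoted in the text rather than recomputing it. The function and variable names are illustrative.

```python
import math

def weibull_cdf(x, alpha, beta):
    """Probability that visibility is less than x miles."""
    return 1.0 - math.exp(-alpha * x ** beta)

# Z_1 from the stated elevation of 1368 feet
z1 = 1368 ** 3 * 1e-9
# Z_2 taken directly from the text (not recomputed here)
z2 = 4.71

# Fitted equations for April, hours 1000 - 1200
alpha = 0.0365 + 0.0049 * z1 - 0.0009 * z2
beta = 1.32 - 0.0092 * z1 - 0.0160 * z2

# Example evaluation: probability that visibility is below 5 miles
p_below_5 = weibull_cdf(5.0, alpha, beta)
```

Rounded to the precision used in the text, `z1` is 2.56, `alpha` is .045, and `beta` is 1.22, which are the values that enter the fitted distribution for Konstanz.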


On  the  other  hand,  the  LSE  was  found  to  be  based  on  non-linear  regression 
methods.  The  LSE  has  been  found  to  be  practical  because  of  its  robustness, 
and  even  though  non-linear  regression  is  required,  it  has  been  found  to  be  no 
more  costly  or  difficult  to  calculate  than  the  MLE. 

It  has  been  found  that  the  concept  of  estimating  a  family  of  distributions 
is  applicable  to  extending  probability  distributions  for  climatic  variables 
to  data-void  regions.  Also,  these  concepts  should  prove  to  be  useful  in  other 
areas  of  research. 